Finding More Bilingual Webpages with High Credibility via Link Analysis

Authors

  • Chengzhi Zhang
  • Xuchen Yao
  • Chunyu Kit
Abstract

This paper presents an efficient approach to finding more bilingual webpage pairs with high credibility via link analysis, using little prior knowledge or heuristics. It extends a previous algorithm that takes the number of bilingual URL pairs that a key (i.e., a URL pairing pattern) can match as the objective function in searching for the best set of keys, namely the set yielding the greatest number of webpage pairs within targeted bilingual websites. Enhanced algorithms are proposed to match more bilingual webpages according to a credibility measure derived from statistical analysis of the link relationships among the available seed websites. With about 12,800 seed websites as the test set, the enhanced algorithms improve precision over the baseline by more than 5 percentage points, from 94.06% to 99.40%, and hence find over 20% more true bilingual URL pairs, illustrating that significantly more bilingual webpages with high credibility can be mined with the help of link analysis.
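To make the idea of a key concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a key can be represented as a pair of substrings that, when swapped in one URL, produce the URL of its counterpart in the other language; the abstract does not specify this representation. The function names `pairs_for_key` and `select_keys`, the candidate keys, and the example URLs are all hypothetical. The sketch scores each key by the number of URL pairs it matches and picks keys greedily; the credibility-based link analysis of the enhanced algorithms is not modeled here.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of a key: (substring in one language's URL,
# substring in the other language's URL). This is an assumption.
Key = Tuple[str, str]
UrlPair = Tuple[str, str]


def pairs_for_key(urls: Iterable[str], key: Key) -> Set[UrlPair]:
    """Return pairs (u, v) where swapping key[0] for key[1] in u yields v."""
    url_set = set(urls)
    a, b = key
    matched = set()
    for u in url_set:
        if a not in u:
            continue
        # Simplification: only the first occurrence of the substring is swapped.
        v = u.replace(a, b, 1)
        if v != u and v in url_set:
            matched.add((u, v))
    return matched


def select_keys(urls: Iterable[str],
                candidate_keys: Iterable[Key]) -> List[Tuple[Key, List[UrlPair]]]:
    """Greedily choose keys, using the number of matched pairs as the score."""
    remaining = set(urls)
    candidates = list(candidate_keys)
    chosen = []
    while candidates:
        # Score every remaining key against the URLs that are still unpaired.
        scored = [(len(pairs_for_key(remaining, k)), k) for k in candidates]
        best_count, best_key = max(scored, key=lambda item: item[0])
        if best_count == 0:
            break  # no remaining key matches any pair
        pairs = pairs_for_key(remaining, best_key)
        chosen.append((best_key, sorted(pairs)))
        for u, v in pairs:
            remaining.discard(u)
            remaining.discard(v)
        candidates.remove(best_key)
    return chosen


# Hypothetical URLs and candidate keys, for illustration only.
example_urls = [
    "http://example.org/en/index.html",
    "http://example.org/zh/index.html",
    "http://example.org/en/about.html",
    "http://example.org/zh/about.html",
    "http://example.org/news_e.html",
    "http://example.org/news_c.html",
    "http://example.org/contact.html",
]
example_keys = [("/en/", "/zh/"), ("_e.", "_c."), ("eng", "chi")]

for key, pairs in select_keys(example_urls, example_keys):
    print(key, len(pairs))
```

A greedy choice is used only because it is the simplest way to illustrate a pair count acting as an objective function; the search procedure in the paper may differ.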

Similar resources

Web Credibility: Features Exploration and Credibility Prediction

The open nature of the World Wide Web makes evaluating webpage credibility challenging for users. In this paper, we aim to automatically assess web credibility by investigating various characteristics of webpages. Specifically, we first identify features from textual content, link structure, webpage design, as well as their social popularity learned from popular social media sites (e.g., Faceb...

A Statistical Model for Measuring Structural Similarity between Webpages

This paper presents a statistical model for measuring structural similarity between webpages from bilingual websites. Starting from basic assumptions, we derive the model and propose an algorithm to estimate its parameters in an unsupervised manner. The statistical approach appears to benefit the structural similarity measure: in the task of distinguishing parallel webpages from bilingual websites our ...

Improving the Compression Efficiency for News Web Service Using Semantic Relations Among Webpages

Both compression and decompression play important roles in a web service system. High compression ratio helps to save the storage, while fast decompression contributes to decreasing the response time of service. Specifically focusing on the news web service, this paper proposes a compression mechanism to improve the efficiency of compression and decompression simultaneously by taking advantage ...

Discovering phishing target based on semantic link network

An approach to the discovery of the phishing target of a suspicious webpage is proposed, which is based on construction and reasoning of the Semantic Link Network (SLN) of the suspicious webpage. The SLN is constructed from the given suspicious webpage and its associated webpages. Since reasoning of the SLN can discover implicit relations among webpages, the true association relations between a...

Who Is a Bilingual?

The question of who is and who is not a bilingual is more difficult to answer than it first appears. Bilingualism was long regarded as the equal mastery of two languages, a definition that still prevails in certain glossaries of linguistics. However, today's complex world requires a more exact definition and analysis of the competencies that community members require to interact with speakers o...

Publication date: 2013